PRELIMINARY AMENDMENT



Sir: 



Prior to the examination of the above-identified application, please 
amend the application as follows and consider the following remarks: 

IN THE TITLE 

The title of the present application has been amended to read: 

A METHOD AND APPARATUS FOR COMPUTING A PACKED SUM
OF ABSOLUTE DIFFERENCES

IN THE SPECIFICATION 



Please insert the following on page 1, line 4: 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of application Ser. No. 09/052,904, 
filed March 31, 1998, currently pending. 

Please replace the table on page 20, lines 1-3 with: 

Table 1



ABSTRACT 

Please substitute the Abstract on page 34, lines 2-10 with the following: 
A method and apparatus is disclosed that computes multiple absolute 
differences from packed data and sums each one of the multiple absolute 
differences together to produce a result. According to one embodiment, a 
processor includes a decode unit to decode a packed sum of absolute
differences (PSAD) instruction having an opcode format to identify a set of
packed data operands. The decode unit initiates a sequence of operations on 
the set of packed data operands in response to decoding the PSAD instruction. 
An execution unit performs a first operation of the sequence of operations 
initiated by the decode logic, and a bus provides the execution unit with the set 
of packed data operands as identified in accordance with the opcode format. 
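
The arithmetic result described in the Abstract can be illustrated with a short reference model. This is only a sketch under stated assumptions: it takes the packed data to be eight unsigned bytes held in a 64-bit value (the claims refer to packed bytes, and Table 1 lists eight elements), and the names `unpack_bytes` and `psad_reference` are hypothetical, not taken from the application.

```python
def unpack_bytes(packed: int, count: int = 8) -> list[int]:
    """Split a packed integer into `count` unsigned bytes, least significant first.

    Assumption: eight 8-bit elements per 64-bit operand.
    """
    return [(packed >> (8 * i)) & 0xFF for i in range(count)]


def psad_reference(src1: int, src2: int) -> int:
    """Sum of |a_i - b_i| over corresponding byte elements of two packed operands."""
    return sum(abs(a - b) for a, b in zip(unpack_bytes(src1), unpack_bytes(src2)))


if __name__ == "__main__":
    print(psad_reference(0x0102030405060708, 0x0807060504030201))
```

The model only shows what value is produced; it says nothing about how the decode unit, execution unit, or bus of the claimed processor arrive at it.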

IN THE CLAIMS 

Presented below are the amended claims in a clean, unmarked format. 

16. (New) A processor comprising: 

a decode unit to decode a plurality of packed data instructions including 
a packed sum of absolute differences (PSAD) instruction having a first format to 
identify a first set of packed data, and a packed multiply-add (PMAD) 
instruction having a second format to identify a second set of packed data, said 
decode unit to initiate a first set of operations on the first set of packed data 
responsive to decoding the PSAD instruction and to initiate a second set of 
operations on the second set of packed data responsive to decoding the PMAD 
instruction; and 

an execution unit to perform a first operation of the first set of operations 
initiated by the decode unit and to perform a second operation of the second set 
of operations initiated by the decode unit. 

17. (New) The processor of Claim 16, wherein the decode unit further
decodes a plurality of instructions of a PENTIUM microprocessor instruction 
set. 

18. (New) The processor of Claim 16, wherein the first set of operations 
comprises: 

a packed subtract and write carry (PSUBWC) operation;

a packed absolute value and read carry (PABSRC) operation; and 

a packed add horizontal (PADDH) operation. 
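
The three operations recited in Claim 18 can be read as a pipeline. The sketch below is one possible software model, assuming 8-bit unsigned elements, a sign indicator of 1 when the subtraction borrows, and a constant of zero in the absolute-value step. The function names simply mirror the operation mnemonics; the bodies are illustrative assumptions, not the circuitry of the application.

```python
MASK = 0xFF  # assumption: packed byte elements


def psubwc(d, e):
    """Packed subtract and write carry: e[i] - d[i] modulo 256, plus one sign bit per element."""
    diffs = [(y - x) & MASK for x, y in zip(d, e)]
    signs = [1 if y < x else 0 for x, y in zip(d, e)]  # 1 means the true difference is negative
    return diffs, signs


def pabsrc(diffs, signs, const=0):
    """Packed absolute value and read carry: negate the elements whose sign bit is set."""
    return [((const - f) if s else (const + f)) & MASK for f, s in zip(diffs, signs)]


def paddh(elems):
    """Packed add horizontal: sum all elements into a single field."""
    return sum(elems)


def psad(d, e):
    """Compose the three operations in the order recited in the claim."""
    diffs, signs = psubwc(d, e)
    return paddh(pabsrc(diffs, signs))
```

For example, with `d = [10, 20, 30, 40, 50, 60, 70, 80]` and `e = [15, 18, 30, 45, 40, 66, 70, 0]`, the intermediate absolute values are `[5, 2, 0, 5, 10, 6, 0, 80]` and the final sum is 108.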

19. (New) The processor of Claim 16, wherein performing the first operation 
causes the execution unit to: 

subtract one of a plurality of elements of a first packed data of the first 
set of packed data from a corresponding one of a plurality of elements of a 
second packed data of the first set of packed data to produce a first result 
having a plurality of difference elements and a plurality of sign indicators. 

20. (New) The processor of Claim 19, wherein the first format identifies the 
first set of packed data as packed bytes. 

21. (New) The processor of Claim 16, wherein performing the first operation 
causes the execution unit to: 

produce a first plurality of partial products in a multiplier having a
plurality of partial product selectors; 

insert an element of a first plurality of elements of a first packed data 
into and substituting for bit positions of one or more of the first plurality of 
partial products by using partial product selectors corresponding to the bit 
positions; and 

add the first plurality of elements together to produce a first result 
including a field comprising a sum of the first plurality of elements, said field 
having a least significant bit. 

22. (New) The processor of Claim 21, wherein performing the first operation 
further causes the execution unit to: 

shift the first result to produce a second result having a least significant 
bit position and to align the least significant bit of the field with the least 
significant bit position of the second result. 
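
Claims 21 and 22 describe summing elements by placing them in a multiplier's partial products and then shifting the sum into position. A minimal sketch of that idea follows. It assumes that the partial products start as zero, that each element is substituted into its own row at one common bit column, and that the column offset is 16; the constants `FIELD_LSB` and `ROWS` and the function name are hypothetical choices made only for illustration.

```python
ELEMENT_BITS = 8
FIELD_LSB = 16  # assumption: bit column at which elements are inserted
ROWS = 8        # assumption: number of partial-product rows used


def paddh_sketch(elements):
    """Model a horizontal add performed in a multiplier's partial-product adder array."""
    assert len(elements) <= ROWS
    partial_products = [0] * ROWS
    field_mask = ((1 << ELEMENT_BITS) - 1) << FIELD_LSB
    for row, value in enumerate(elements):
        # Substitute the element for the selected bit positions of this row.
        cleared = partial_products[row] & ~field_mask
        partial_products[row] = cleared | ((value & 0xFF) << FIELD_LSB)
    # The adder array sums every row; the sum appears as a field starting at FIELD_LSB.
    first_result = sum(partial_products)
    # Shift so the field's least significant bit lands at bit position 0.
    second_result = first_result >> FIELD_LSB
    return first_result, second_result
```

Because every element sits in the same column, the existing adder array produces their sum without any multiplication taking place; the final shift only relocates the field.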

23. (New) The processor of Claim 21, wherein performing the second 
operation causes the execution unit to: 

produce a second plurality of partial products in the multiplier having 
the plurality of partial product selectors, the second plurality of partial 
products comprising four distinct sets of partial products including a first set of 
partial products corresponding to a first product for elements of the second set 
of packed data, a second set of partial products corresponding to a second 
product for elements of the second set of packed data, a third set of partial
products corresponding to a third product for elements of the second set of 
packed data, and a fourth set of partial products corresponding to a fourth 
product for elements of the second set of packed data; and 

add the first set of partial products together with the second set of partial 
products to produce a first distinct element of a packed result and add the third 
set of partial products together with the fourth set of partial products to 
produce a second distinct element of the packed result. 

24. (New) The processor of Claim 23, wherein the second format identifies 
the second set of packed data as packed words. 
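
The result of the packed multiply-add recited in Claims 23 and 24 can be sketched as follows. The model assumes four signed 16-bit words per operand, pairing of adjacent products, and 32-bit wrap-around of each sum; none of these details is stated in the claims, and the names `pmad` and `to_signed32` are hypothetical.

```python
def to_signed32(value: int) -> int:
    """Wrap an integer to a signed 32-bit result (assumed overflow behavior)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pmad(a, b):
    """Multiply corresponding words, then add the products pairwise.

    Four products yield a packed result of two distinct elements.
    """
    assert len(a) == len(b) == 4
    assert all(-32768 <= x <= 32767 for x in list(a) + list(b))
    products = [x * y for x, y in zip(a, b)]
    return [to_signed32(products[0] + products[1]),
            to_signed32(products[2] + products[3])]
```

For instance, `pmad([1, 2, 3, 4], [5, 6, 7, 8])` gives `[17, 53]`, since 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.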

25. (New) The processor of Claim 16, wherein performing the first operation 
causes the execution unit to: 

receive a plurality of difference elements and a plurality of sign 
indicators; 

produce a result data having a plurality of absolute value elements, each 
absolute value element produced by 

(a) subtracting one of the plurality of difference elements from a 
corresponding constant value if the sign indicator corresponding to that 
element is in a first state, or 

(b) adding one of the plurality of difference elements to a corresponding 
constant value if the sign indicator corresponding to that element is in a second
state. 
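
The selection between steps (a) and (b) of Claim 25 can be written out as a worked equation. This is an illustrative reading only: it assumes 8-bit elements, a constant value of zero, that the first state of the sign indicator marks a negative difference, and that the difference element is formed as E minus D. The symbols D, E, and F follow Table 1; the symbols for the difference element, the sign indicator, and the constant are introduced here.

```latex
% \Delta_i: difference element, s_i: sign indicator, K: constant value (assumed 0)
\[
F_i =
\begin{cases}
(K - \Delta_i) \bmod 2^{8}, & s_i = 1 \quad \text{(first state: difference negative)} \\
(K + \Delta_i) \bmod 2^{8}, & s_i = 0 \quad \text{(second state: difference non-negative)}
\end{cases}
\qquad K = 0
\]

% Worked example with E_i = 18 and D_i = 20:
\[
\Delta_i = (18 - 20) \bmod 256 = 254, \qquad s_i = 1, \qquad
F_i = (0 - 254) \bmod 256 = 2 = |18 - 20|
\]
```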

26. (New) A processor to execute instructions of the PENTIUM 
microprocessor instruction set, the processor comprising: 

decode logic to decode a packed sum of absolute differences (PSAD) 
instruction having a first format to identify a first set of packed data, said 
decode logic to initiate a first set of operations on the first set of packed data 
responsive to decoding the PSAD instruction; 

execution logic to perform a first operation of the first set of operations 
initiated by the decode logic; and 

a bus to provide the first set of packed data to the execution logic for 
performing of the first operation. 

27. (New) The processor of Claim 26, wherein the decode logic comprises a 
look-up table. 
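
A look-up table of the kind recited in Claim 27 might be modeled as a mapping from an instruction to the sequence of operations it initiates. The sketch below keys the table by mnemonic because the application text here gives no opcode encodings; the PSAD sequence follows the operations listed in Claim 39, and the names `DECODE_TABLE` and `decode` are hypothetical.

```python
# Assumption: mnemonics stand in for the actual opcode encodings, which are not given here.
DECODE_TABLE = {
    "PSAD": ("PSUBWC", "PABSRC", "PADDH"),
}


def decode(mnemonic: str) -> tuple[str, ...]:
    """Return the sequence of operations initiated when the instruction is decoded."""
    try:
        return DECODE_TABLE[mnemonic]
    except KeyError:
        raise ValueError(f"no decode entry for {mnemonic!r}") from None
```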

28. (New) The processor of Claim 26, wherein the decode logic comprises 
integrated circuitry. 

29. (New) The processor of Claim 28, wherein the decode logic further 
comprises executable operations. 

30. (New) The processor of Claim 29, wherein the decode logic comprises:

a packed subtract and write carry (PSUBWC) operation;

a packed absolute value and read carry (PABSRC) operation; and

a packed add horizontal (PADDH) operation.

31. (New) The processor of Claim 26, wherein the first format identifies the 
first set of packed data as packed bytes. 

32. (New) The processor of Claim 31, wherein performing the first operation 
causes the execution logic to: 

subtract one of a plurality of elements of a first packed data of the first 
set of packed data from a corresponding one of a plurality of elements of a 
second packed data of the first set of packed data to produce a first result 
having a plurality of difference elements and a plurality of sign indicators; and 

store the plurality of difference elements and the plurality of sign 
indicators. 

33. (New) The processor of Claim 26, wherein performing the first operation 
causes the execution logic to: 

produce a first plurality of partial products in a multiplier having a 
plurality of partial product selectors; 

insert an element of a first plurality of elements of a first packed data 
into and substituting for bit positions of one or more of the first plurality of 
partial products by using partial product selectors corresponding to the bit
positions; and 

add the first plurality of elements together to produce a first result 
including a field comprising a sum of the first plurality of elements, said field 
having a least significant bit. 

34. (New) The processor of Claim 33, wherein performing the first operation 
further causes the execution logic to: 

shift the first result to produce a second result having a least significant 
bit position and to align the least significant bit of the field with the least 
significant bit position of the second result. 

35. (New) The processor of Claim 26, the decode unit to decode a packed 
multiply-add (PMAD) instruction having a second format to identify a second 
set of packed data, said decode unit to initiate a second set of operations on the 
second set of packed data responsive to decoding the PMAD instruction. 

36. (New) The processor of Claim 35, execution unit to perform a second 
operation of the second set of operations initiated by the decode unit. 

37. (New) The processor of Claim 35, wherein the second format identifies 
the second set of packed data as packed words. 

38. (New) The processor of Claim 26, wherein performing the first operation
causes the execution logic to: 

receive a plurality of difference elements and a plurality of sign 
indicators; 

produce a result data having a plurality of absolute value elements, each 
absolute value element produced by 

(a) subtracting one of the plurality of difference elements from a 
corresponding constant value if the sign indicator corresponding to that 
element is in a first state, or 

(b) adding one of the plurality of difference elements to a corresponding 
constant value if the sign indicator corresponding to that element is in a second 
state. 

39. (New) A processor comprising: 

decode logic to decode a packed sum of absolute differences (PSAD) 
instruction having a first format to identify a first set of packed data, said 
decode logic to initiate a first set of operations on the first set of packed data 
responsive to decoding the PSAD instruction, the first set of operations 
comprising: 

a packed subtract and write carry (PSUBWC) operation; 

a packed absolute value and read carry (PABSRC) operation; and 

a packed add horizontal (PADDH) operation; and

execution logic to perform the first set of operations initiated by the
decode logic. 



40. (New) The processor of Claim 39, wherein the first format identifies the 
first set of packed data as packed bytes. 

41. (New) The processor of Claim 39, wherein performing the PSUBWC
operation causes the execution logic to: 

subtract one of a plurality of elements of a first packed data of the first 
set of packed data from a corresponding one of a plurality of elements of a 
second packed data of the first set of packed data to produce a first result 
having a plurality of difference elements and a plurality of sign indicators; and 

store the plurality of difference elements and the plurality of sign 
indicators. 

42. (New) The processor of Claim 39, wherein performing the PABSRC 
operation causes the execution logic to: 

receive a plurality of difference elements and a plurality of sign 
indicators; 

produce a result data having a plurality of absolute value elements, each 
absolute value element produced by 

(a) subtracting one of the plurality of difference elements from a 
corresponding constant value if the sign indicator corresponding to that 
element is in a first state, or

(b) adding one of the plurality of difference elements to a corresponding 
constant value if the sign indicator corresponding to that element is in a second 
state. 

43. (New) The processor of Claim 39, wherein performing the PADDH 
operation causes the execution logic to: 

produce a first plurality of partial products in a multiplier having a 
plurality of partial product selectors; 

insert an element of a first plurality of elements of a first packed data 
into and substituting for bit positions of one or more of the first plurality of 
partial products by using partial product selectors corresponding to the bit 
positions; and 

add the first plurality of elements together to produce a first result 
including a field comprising a sum of the first plurality of elements, said field 
having a least significant bit. 

44. (New) The processor of Claim 43, wherein performing the PADDH operation 
further causes the execution logic to: 

shift the first result to produce a second result having a least significant bit 
position and to align the least significant bit of the field with the least significant bit 
position of the second result. 

REMARKS

If there are any additional charges, please charge Deposit Account No. 
02-2666. If a telephone interview would in any way expedite the prosecution of 
the present application, the Examiner is invited to contact Maria McCormack 
Sobrino at (408) 720-8300. 

Respectfully submitted, 

Blakely, Sokoloff, Taylor & Zafman LLP 



Dated: [illegible], 2001



12400 Wilshire Blvd. 
Seventh Floor 

Los Angeles, CA 90025-1026 
(408) 720-8300 



Maria McCormack Sobrino 
Reg. No. 31,639 

VERSION OF SPECIFICATION AND CLAIMS WITH MARKINGS:



IN THE TITLE 

The title of the present application has been amended from "A METHOD 
AND APPARATUS FOR COMPUTING A SUM OF PACKED DATA 
ELEMENTS USING SIMD MULTIPLY CIRCUITRY" to -A METHOD AND 
APPARATUS FOR COMPUTING A PACKED SUM OF ABSOLUTE 
DIFFERENCES- 

IN THE SPECIFICATION 

On page 1, at line 4, please insert: 

- CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a continuation of application Ser. No. 09/052,904, 
filed March 31, 1998, currently pending.- 

On page 1, at line 5, please delete "1.". 

On page 1, at line 10, please delete "2.". 

On page 4, please delete lines 1-10. 

On page 20, please delete Table 1 as follows: 

and replace with the following Table 1:

                      Receives                          Generates
PSUBWC/PABSRC         Packed Data                                       Packed Data
arithmetic element    elements       C_input,i          C_output,i      element
------------------    -----------    ---------          ----------      -----------
1000                  D_0 and E_0    C_input,0          C_output,0      F_0
1010                  D_1 and E_1    C_input,1          C_output,1      F_1
1020                  D_2 and E_2    C_input,2          C_output,2      F_2
1030                  D_3 and E_3    C_input,3          C_output,3      F_3
1040                  D_4 and E_4    C_input,4          C_output,4      F_4
1050                  D_5 and E_5    C_input,5          C_output,5      F_5
1060                  D_6 and E_6    C_input,6          C_output,6      F_6
1070                  D_7 and E_7    C_input,7          C_output,7      F_7

Table 1



IN THE CLAIMS 

Please delete claims 1-15. 

IN THE ABSTRACT 

Please substitute the Abstract on page 34, lines 2-10 with the following: 

-ABSTRACT



A method and apparatus is disclosed that computes multiple absolute 
differences from packed data and sums each one of the multiple absolute 
differences together to produce a result. According to one embodiment, a 
processor includes a decode unit to decode a packed sum of absolute 
differences (PSAD) instruction having an opcode format to identify a set of 
packed data operands. The decode unit initiates a sequence of operations on 
the set of packed data operands in response to decoding the PSAD instruction. 
An execution unit performs a first operation of the sequence of operations 
initiated by the decode logic, and a bus provides the execution unit with the set 
of packed data operands as identified in accordance with the opcode format- 